Earlier this year the floodgates began to open on government data in the United States with President Obama’s announcement of an unprecedented push toward further transparency in the federal government. But with the rush of new data comes the challenge of making sense of it all, something admittedly still in its formative stages.
By June of 2009 the nation’s Chief Information Officer, Vivek Kundra, had overseen the launch of data.gov with the goal of increasing public access to machine-readable datasets produced by the federal government. Other government sites such as usaspending.gov and recovery.gov have since been launched to provide even more focused data on how the U.S. spends its taxpayers’ dollars.
A Promise of Increased Participation
The promise of data.gov and of many of these other civic data collections is in allowing citizens to participate in the scrutiny of their government and of society at large, opening vast stores of data for examination by anyone with the interest and patience to do so. Open source civic data analysis, if you will.
The array of data available on data.gov is still sparse in some areas but has steadily grown to include datasets on residential energy consumption, patent applications, and national water quality, among others.
And the data transparency movement isn’t just federal anymore: state and local governments such as California and New York City are following suit with pledges to make more civic data available.
The benefits of the open data movement are also starting to be recognized throughout Europe. The U.K. has called on Sir Tim Berners-Lee, the inventor of the World Wide Web, to lead a similar government data transparency effort there, which should soon result in a data.gov analogue. And signs of a similar movement are beginning to stir in Germany (link in German).
Beyond Data Scraping
In the U.S., various government data resources have been available online in some form or another for years now. Programmers could scrape these online data sources by writing custom parsers to scan webpages and create their own databases. And many journalist-programmers working in today’s newsrooms still do. But it’s messy, it doesn’t scale or extend well, it’s brittle, and ultimately the data that results may not interoperate well with other data.
Having government buy-in to the publication of organized and structured data substantially lowers the barriers for developers and others to get involved with analyzing that data. It also means that data in structured formats, such as those that conform to semantic Web standards, can interoperate more easily and be used to build ever more complex applications on top of the data.
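To make the contrast concrete, here is a minimal Python sketch of the two routes. Everything in it is an assumption made for illustration: the HTML fragment, the CSV text, the column names, and the unemployment figures are invented, and a real agency page would be considerably messier. The scraper has to know where the numbers sit in the page markup, while the structured file names its own columns.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment with invented figures.
HTML_PAGE = """
<table>
  <tr><th>State</th><th>Unemployment rate</th></tr>
  <tr><td>Minnesota</td><td>7.3</td></tr>
  <tr><td>Ohio</td><td>10.1</td></tr>
</table>
"""

# The same invented figures published as structured CSV.
CSV_FILE = "state,unemployment_rate\nMinnesota,7.3\nOhio,10.1\n"


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())


def scrape(html):
    """Brittle route: depends on the exact tags and column order of the page."""
    parser = TableScraper()
    parser.feed(html)
    # The header row holds only <th> cells, so it comes out empty; skip it.
    return {row[0]: float(row[1]) for row in parser.rows if len(row) == 2}


def read_structured(text):
    """Structured route: columns are addressed by name, not by position."""
    reader = csv.DictReader(io.StringIO(text))
    return {r["state"]: float(r["unemployment_rate"]) for r in reader}


scraped = scrape(HTML_PAGE)
structured = read_structured(CSV_FILE)
print(scraped == structured)
```

Both routes yield the same dictionary here, but only the scraper breaks if the page is redesigned, say by adding a column or switching from a table to styled div elements.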
Data.gov ≠ Insight.gov
So now that the U.S. government is publishing all kinds of data online, society will be better, right? Well – maybe. Let’s not forget that data has a long way to go before it becomes the information and knowledge that can ultimately feed back into policy.
Some non-governmental organizations are pushing data to become information by running contests with big prizes. For instance, the Apps for America 2 contest, coordinated by Sunlight Labs, awarded a total of $25,000 to the top application submissions that made data.gov data more transparent and accessible for citizens.
These efforts at coordinating developers and stimulating application development around government data are vital, no doubt. The applications that result typically involve polished interfaces and visuals which make it much easier for people to search, browse, and mash up the data.
Take for example the Apps for America 2 winner, DataMasher, which lets users create national heat maps by crossing two datasets (either adding, subtracting, dividing, or multiplying values). These operations, however, can’t show correlation, and at best they can only show outliers. As one anonymous commenter put it:
I don’t get it. It shows violent crime times poverty. So these are either poor, or violent, or both? I don’t think multiplying the two factors is very enlightening.
What we end up with is that many of the possible combinations of datasets lead to downright pointless maps which add little if any information to a discourse about those datasets.
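A small Python sketch illustrates the difference between crossing two datasets and measuring how they relate. It is not DataMasher's code: the function names `mash` and `pearson`, the state labels, and all of the numbers are invented for illustration. Multiplying gives one value per state, which can flag an extreme state, while a correlation coefficient summarizes the relationship across all states in a single number.

```python
from math import sqrt

# Hypothetical per-state values, invented purely for illustration.
poverty = {"A": 10.0, "B": 12.0, "C": 14.0, "D": 16.0, "E": 30.0}
crime = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 9.0}


def mash(x, y):
    """Cross two datasets by multiplying their values state by state."""
    return {state: x[state] * y[state] for state in x}


def pearson(x, y, states):
    """Pearson correlation coefficient over the given states."""
    n = len(states)
    mean_x = sum(x[s] for s in states) / n
    mean_y = sum(y[s] for s in states) / n
    cov = sum((x[s] - mean_x) * (y[s] - mean_y) for s in states)
    spread_x = sqrt(sum((x[s] - mean_x) ** 2 for s in states))
    spread_y = sqrt(sum((y[s] - mean_y) ** 2 for s in states))
    return cov / (spread_x * spread_y)


mashed = mash(poverty, crime)
hottest = max(mashed, key=mashed.get)  # the state a heat map would highlight

r_all = pearson(poverty, crime, list(poverty))
r_without_outlier = pearson(poverty, crime, ["A", "B", "C", "D"])

print(hottest, round(r_all, 2), round(r_without_outlier, 2))
```

In this made-up data the product singles out state E, yet it says nothing about the fact that, among the other four states, crime falls as poverty rises. That relationship only shows up in the correlation, and it changes sign once the outlier is included.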
Data.gov and indeed many of the applications built around it fall short of the mark in terms of helping people share and build on the insights of others – to produce information. It’s not simply that we need interfaces to data; we also need ways to collaboratively make sense of that data.
The Minnesota Employment Explorer was an early foray into helping people collaboratively make sense of government data. It not only visualizes employment information but also allows people to ask questions and build on the insights of others looking at the visuals in order to make sense of the data. In the long run it’s these kinds of sensemaking tools that will really unlock the potential of the datasets published by the government.
What’s Next?
With a long tradition of making sense of the complex, there’s a unique opportunity for the institution of journalism to play a leadership role here. Journalists can leverage their experience and expertise with storytelling to provide structured and comprehensive explorations of datasets as well as context for the interpretation of data via these applications. Moreover, journalists can focus the efforts and attention of interested citizens to channel the sensemaking process.
I’ll suggest four explicit ways forward here:
(1) that data-based applications be built to promote information and insight rather than to serve simply as database widgets,
(2) that journalists should be leaders (but still collaborators with the public) in this sensemaking enterprise,
(3) that these applications incorporate the ability to aggregate insights around whatever visual interface is being presented, and
(4) that data.gov or other governmental data portals should collect and show trackback links to all applications pulling from its various datasets.
And finally, once we all figure out how to make sense of all this great new data, there remains the question of whether government is even “listening” to these applications. Is the federal government prepared to incorporate the insights of its constituents’ data analysis into policy?
[…] Comment! The growing volumes of public data being made available offer great opportunities for both increased user involvement and new ways of doing journalism, writes Nicholas Diakopoulos in an article for the online magazine Vox Publica. […]